    A Bayes interpretation of stacking for M-complete and M-open settings

    In M-open problems where no true model can be conceptualized, it is common to back off from modeling and merely seek good prediction. Even in M-complete problems, taking a predictive approach can be very useful. Stacking is a model averaging procedure that gives a composite predictor by combining individual predictors from a list of models using weights that optimize a cross-validation criterion. We show that the stacking weights also asymptotically minimize a posterior expected loss. Hence we formally provide a Bayesian justification for cross-validation. Often the weights are constrained to be positive and sum to one. For greater generality, we omit the positivity constraint and relax the `sum to one' constraint. A key question is `What predictors should be in the average?' We first verify that the stacking error depends only on the span of the models. Then we propose using bootstrap samples from the data to generate empirical basis elements that can be used to form models. We use this in two computed examples to give stacking predictors that are (i) data driven, (ii) optimal with respect to the number of component predictors, and (iii) optimal with respect to the weight each predictor gets.
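    The cross-validation criterion behind these weights admits a short computational sketch. The code below is an illustrative reading of the procedure rather than the paper's implementation: the model list, the toy data-generating mechanism, and the use of unconstrained least squares on out-of-fold predictions (mirroring the relaxed positivity and sum-to-one constraints) are assumptions of this sketch.

```python
# Illustrative sketch of stacking weights chosen by a cross-validation
# criterion.  The model list, data-generating mechanism, and variable names
# are hypothetical, not the paper's.
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.model_selection import KFold

def stacking_weights(models, X, y, n_splits=5):
    """Weights minimizing squared error of out-of-fold predictions.

    No positivity or sum-to-one constraint is imposed, mirroring the
    relaxation discussed in the abstract.
    """
    n, K = len(y), len(models)
    oof = np.zeros((n, K))  # out-of-fold predictions, one column per model
    for train, test in KFold(n_splits, shuffle=True, random_state=0).split(X):
        for k, model in enumerate(models):
            oof[test, k] = model.fit(X[train], y[train]).predict(X[test])
    # Unconstrained least-squares solution of the cross-validation criterion.
    w, *_ = np.linalg.lstsq(oof, y, rcond=None)
    return w

# Toy usage on a hypothetical data-generating mechanism.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = np.sin(X[:, 0]) + 0.5 * X[:, 1] ** 2 + rng.normal(scale=0.3, size=200)
print("stacking weights:", stacking_weights([LinearRegression(), Ridge(alpha=1.0)], X, y))
```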

    Prediction in several conventional contexts

    We review predictive techniques from several traditional branches of statistics. Starting with prediction based on the normal model and on the empirical distribution function, we proceed to techniques for various forms of regression and classification. Then, we turn to time series, longitudinal data, and survival analysis. Our focus throughout is on the mechanics of prediction more than on the properties of predictors.

    Desiderata for a Predictive Theory of Statistics

    In many contexts the predictive validation of models, or of their associated prediction strategies, is of greater importance than model identification, which may be practically impossible. This is particularly so in fields involving complex or high dimensional data where model selection, or more generally predictor selection, is the main focus of effort. This paper suggests a unified treatment for predictive analyses based on six 'desiderata'. These desiderata are an effort to clarify what criteria a good predictive theory of statistics should satisfy.

    Comment on Article by Sancetta

    This paper makes a landmark contribution in three senses. First, it provides many results that are fundamentally important in their own right. I refer specifically to Theorems 3 and 8. Theorem 3 treats arbitrary loss functions by breaking the integral into two terms: one, I_t, where a difference of losses is bounded, and another, II_t, where a bound on the moments of a difference of losses must be used. (All notation here is the same as the author's unless noted otherwise.) The treatment of these two terms reveals the role of the relative entropy and how the tails of the loss affect the risk, respectively. This is a proof that makes us wiser.
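    Schematically, the split described here can be written as below; the truncation level B and the loss difference \Delta_t are notation introduced for illustration only, not the author's.

```latex
% Illustrative sketch of the two-term split; B and \Delta_t are notation
% introduced here, not the author's.
\[
  \Delta_t := \ell(\hat{f}_t, Y_t) - \ell(f^{*}, Y_t), \qquad
  \mathbb{E}[\Delta_t]
  = \underbrace{\int_{\{|\Delta_t|\le B\}} \Delta_t\, \mathrm{d}P}_{I_t}
  + \underbrace{\int_{\{|\Delta_t| > B\}} \Delta_t\, \mathrm{d}P}_{II_t}.
\]
```

    In this reading, I_t is controlled where the loss difference is bounded (which is where the relative entropy enters) and II_t is controlled through moment bounds on the tails.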

    Comparing Bayes Model Averaging and Stacking When Model Approximation Error Cannot be Ignored

    We compare Bayes Model Averaging, BMA, to a non-Bayes form of model averaging called stacking. In stacking, the weights are no longer posterior probabilities of models; they are obtained by a technique based on cross-validation. When the correct data generating model (DGM) is on the list of models under consideration, BMA is never worse than stacking and often is demonstrably better, provided that the noise level is of order commensurate with the coefficients and explanatory variables. Here, however, we focus on the case that the correct DGM is not on the model list and may not be well approximated by the elements on the model list. We give a sequence of computed examples by choosing model lists and DGMs to contrast the risk performance of stacking and BMA. In the first examples, the model lists are chosen to reflect geometric principles that should give good performance. In these cases, stacking typically outperforms BMA, sometimes by a wide margin. In the second set of examples we examine how stacking and BMA perform when the model list includes all subsets of a set of potential predictors. When we standardize the size of terms and coefficients in this setting, we find that BMA outperforms stacking when the deviant terms in the DGM ‘point’ in directions accommodated by the model list, but that when the deviant term points outside the model list stacking seems to do better. Overall, our results suggest that stacking has better robustness properties than BMA in the most important settings.
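    The flavor of such a computed comparison can be sketched in a few lines. Everything below is hypothetical rather than taken from the paper: the BMA weights are approximated through BIC, the model list is all subsets of four linear predictors, and the "deviant" term is a sine component outside the span of that list.

```python
# Hypothetical sketch of a BMA-versus-stacking comparison when the DGM
# contains a "deviant" term not spanned by the (all-subsets linear) model
# list.  None of the numerical choices below are the paper's.
import numpy as np
from itertools import combinations

rng = np.random.default_rng(1)
n, p = 150, 4
X = rng.normal(size=(n, p))

def dgm(X, rng):
    # Linear signal plus a nonlinear "deviant" term outside the model list.
    return (X[:, 0] - 0.8 * X[:, 1] + 0.7 * np.sin(3 * X[:, 2])
            + rng.normal(scale=0.5, size=len(X)))

y = dgm(X, rng)
subsets = [list(s) for r in range(1, p + 1) for s in combinations(range(p), r)]

def ols(Xs, y):
    beta, *_ = np.linalg.lstsq(Xs, y, rcond=None)
    return beta

def bic(Xs, y):
    rss = np.sum((y - Xs @ ols(Xs, y)) ** 2)
    return len(y) * np.log(rss / len(y)) + Xs.shape[1] * np.log(len(y))

# BMA weights: exp(-BIC/2), normalized (BIC approximation to posterior
# model probabilities).
bics = np.array([bic(X[:, s], y) for s in subsets])
bma_w = np.exp(-0.5 * (bics - bics.min()))
bma_w /= bma_w.sum()

# Stacking weights: unconstrained least squares on out-of-fold predictions.
oof = np.zeros((n, len(subsets)))
for test in np.array_split(rng.permutation(n), 5):
    train = np.setdiff1d(np.arange(n), test)
    for k, s in enumerate(subsets):
        oof[test, k] = X[test][:, s] @ ols(X[train][:, s], y[train])
stack_w, *_ = np.linalg.lstsq(oof, y, rcond=None)

# Compare predictive risk on fresh data from the same DGM.
Xnew = rng.normal(size=(2000, p))
ynew = dgm(Xnew, rng)
preds = np.column_stack([Xnew[:, s] @ ols(X[:, s], y) for s in subsets])
print("BMA risk     :", np.mean((ynew - preds @ bma_w) ** 2))
print("stacking risk:", np.mean((ynew - preds @ stack_w) ** 2))
```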

    Using the Bayesian Shtarkov solution for predictions

    The Bayes Shtarkov predictor can be defined and used for a variety of data sets that are exceedingly hard if not impossible to model in any detailed fashion. Indeed, this is the setting in which the derivation of the Shtarkov solution is most compelling. The computations show that anytime the numerical approximation to the Shtarkov solution is ‘reasonable’, it is better in terms of predictive error than a variety of other general predictive procedures. These include two forms of additive model as well as bagging or stacking with support vector machines, Nadaraya–Watson estimators, or draws from a Gaussian Process Prior.
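    As background not contained in the abstract: the classical Shtarkov solution is the normalized maximum likelihood distribution, which attains the minimax pointwise regret under log loss; the Bayesian variant studied in the paper modifies this construction, so the display below is context only.

```latex
% Classical Shtarkov (normalized maximum likelihood) solution under log loss;
% background only -- the paper works with a Bayesian variant.
\[
  q^{*}(x^{n}) \;=\;
  \frac{\sup_{\theta} p_{\theta}(x^{n})}
       {\int \sup_{\theta} p_{\theta}(y^{n})\, \mathrm{d}y^{n}},
  \qquad
  q^{*} \in \arg\min_{q}\, \max_{x^{n}}
  \log\frac{\sup_{\theta} p_{\theta}(x^{n})}{q(x^{n})}.
\]
```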

    Reference priors for exponential families with increasing dimension

    In this article, we establish the asymptotic normality of the posterior distribution for the natural parameter in an exponential family based on independent and identically distributed data. The mode of convergence is expected Kullback-Leibler distance and the number of parameters p is increasing with the sample size n. Using this, we give an asymptotic expansion of the Shannon mutual information valid when p = p_n increases at a sufficiently slow rate. The second term in the asymptotic expansion is the largest term that depends on the prior and can be optimized to give the Jeffreys prior as the reference prior in the absence of nuisance parameters. In the presence of nuisance parameters, we find an analogous result for each fixed value of the nuisance parameter. In three examples, we determine the rates at which p_n can be allowed to increase while still retaining asymptotic normality and the reference prior property.
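    For fixed dimension, the expansion referred to here has the standard form sketched below; the paper's contribution is to identify rates at which it continues to hold with p = p_n growing with n.

```latex
% Fixed-dimension form of the mutual information expansion; the second,
% prior-dependent term is maximized by the Jeffreys prior.
\[
  I(\Theta; X^{n}) \;=\; \frac{p}{2}\log\frac{n}{2\pi e}
  \;+\; \int \pi(\theta)\,
        \log\frac{\sqrt{\det I(\theta)}}{\pi(\theta)}\, \mathrm{d}\theta
  \;+\; o(1).
\]
```

    The second, prior-dependent term is maximized by taking \pi(\theta) proportional to \sqrt{\det I(\theta)}, i.e., the Jeffreys prior, which is why it emerges as the reference prior in the absence of nuisance parameters.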

    J. K. Ghosh's contribution to statistics: A brief outline

    Professor Jayanta Kumar Ghosh has contributed massively to various areas of Statistics over the last five decades. Here, we survey some of his most important contributions. In roughly chronological order, we discuss his major results in the areas of sequential analysis, foundations, asymptotics, and Bayesian inference. It is seen that he progressed from thinking about data points, to thinking about data summarization, to the limiting cases of data summarization as they relate to parameter estimation, and then to more general aspects of modeling including prior and model selection. Published in the IMS Collections (http://www.imstat.org/publications/imscollections.htm) by the Institute of Mathematical Statistics (http://www.imstat.org): http://dx.doi.org/10.1214/074921708000000011.